This directory contains the source code to create and update the database of TikTok videos. 

The files under the "Generate" folder take as input:
1) the video features extracted from the text, image, audio, and editing modalities
2) the labels/values of the growth residuals left over from the Bayesian estimation.

The files under the "Output" folder generate data from the database, including:
1) feature data for the train/test sets used in our prediction models
2) feature data for the top/random 100 videos under each hashtag, used in our content evolution analysis
3) feature data for the videos newly added each day under each hashtag, used in our content evolution analysis.

Here is the database structure:

Hashtag1 --video1 --text   --description
.         .                --sticker text
.         .                --voice-over
.                 --image  --frame1 --VGG19 features
                            .       --object class features
                                    --visual sentiment features
                            .
                            .
                           --frameT (T = videoLength / 3 seconds)

                  --audio  --YAMNET (music classes)
                           --other music features (MFCC, pitch, etc.)
                           --music universality 

                  --editing--videoLength
                           --#stickers
                           --stickerLength
                           --temporal + spatial variation in object classes/visual sentiment/YAMNET classes
                           --averageSceneLength
                           --aesthetic features

                  --labels --over-performing in growth rate (1 if residual_a>0)
                           --over-performing in starting impression (1 if residual_b>0)
                           --over-performing in both (1 if residual_a>0 & residual_b>0)

                  --residuals --growth rate (numeric)
.                             --starting impression (numeric)
.
.
HashtagN
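
The structure above can be sketched as a nested Python dictionary. This is only an illustration: the actual storage backend is not specified here, every key and function name below is hypothetical, and rounding the frame count up is an assumption. It shows how one video entry nests under a hashtag, how the number of frames T follows from the video length, and how the three labels follow from the two residuals.

```python
import math

# From the diagram: T = videoLength / 3 seconds.
FRAME_INTERVAL_SECONDS = 3


def frame_count(video_length_seconds):
    """Number of frames T for a video (rounding up is an assumption)."""
    return math.ceil(video_length_seconds / FRAME_INTERVAL_SECONDS)


def make_labels(residual_a, residual_b):
    """Binary labels derived from the growth-rate residual (a) and the
    starting-impression residual (b), as listed under --labels."""
    over_growth = int(residual_a > 0)
    over_start = int(residual_b > 0)
    return {
        "over_performing_growth_rate": over_growth,
        "over_performing_starting_impression": over_start,
        "over_performing_both": int(over_growth and over_start),
    }


def make_video_record(video_length_seconds, residual_a, residual_b):
    """Empty record for one video; feature values are left as None."""
    frames = [
        {"vgg19": None, "object_classes": None, "visual_sentiment": None}
        for _ in range(frame_count(video_length_seconds))
    ]
    return {
        "text": {"description": None, "sticker_text": None, "voice_over": None},
        "image": frames,
        "audio": {"yamnet": None, "other_music_features": None,
                  "music_universality": None},
        "editing": {"video_length": video_length_seconds, "n_stickers": None,
                    "sticker_length": None, "variation": None,
                    "average_scene_length": None, "aesthetic": None},
        "labels": make_labels(residual_a, residual_b),
        "residuals": {"growth_rate": residual_a,
                      "starting_impression": residual_b},
    }


# Hashtag -> video -> modality, mirroring the tree above.
database = {"Hashtag1": {"video1": make_video_record(10, 0.2, -0.1)}}
```

With these placeholder values, `database["Hashtag1"]["video1"]["image"]` holds four frame entries, and only the growth-rate label is 1.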